Search results for "Search engine indexing"

showing 10 items of 56 documents

A Lack of Attribution: Closing the Citation Gap Through a Reform of Citation and Indexing Practices

2012

0106 biological sciences010506 paleontologybusiness.industrymedia_common.quotation_subjectClosing (real estate)Search engine indexingAccountingPlant Science010603 evolutionary biology01 natural sciencesGeographyCitationAttributionbusinessEcology Evolution Behavior and Systematics0105 earth and related environmental sciencesmedia_commonTAXON
researchProduct

Exceptional Pattern Discovery

2017

This chapter is devoted to a discussion on exceptional pattern discovery, namely on scenarios, contexts, and techniques concerning the mining of patterns which are so rare or so frequent to be considered as exceptional and, then, of interest for an expert to shed lights on the domain. Frequent patterns have found broad applications in areas like association rule mining, indexing, and clustering [1, 20, 23]. The application of frequent patterns in classification also achieved some success in the classification of relational data [6, 13, 14, 19, 25], text [15], and graphs [7]. The part is organized as follows. First, the frequent pattern mining on classical datasets is presented. This is not …

0301 basic medicineBiological dataPoint (typography)Association rule learningComputer scienceRelational databasebusiness.industrySearch engine indexingcomputer.software_genreDomain (software engineering)Network pattern03 medical and health sciences030104 developmental biology0302 clinical medicineArtificial intelligenceCluster analysisbusinesscomputer030217 neurology & neurosurgeryNatural language processing
researchProduct

Reverse-safe data structures for text indexing

2021

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optim…

050101 languages & linguisticsComputer sciencedata structure02 engineering and technologyprivacySet (abstract data type)combinatoric0202 electrical engineering electronic engineering information engineering0501 psychology and cognitive sciencesPattern matchingSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazionialgorithmSettore INF/01 - Informatica05 social sciencesSearch engine indexingINF/01 - INFORMATICAdata miningData structureMatrix multiplicationcombinatoricsExponent020201 artificial intelligence & image processingdata structure; algorithm; combinatorics; de Bruijn graph; data mining; privacyAlgorithmAdversary modelde Bruijn graphInteger (computer science)
researchProduct

Video preprocessing for audiovisual indexing

2003

We address the problem of detecting shots of subjects that are interviewed in news sequences. This is useful since usually these kinds of scenes contain important and reusable information that can be used for other news programs. In a previous paper, we presented a technique based on a priori knowledge of the editing techniques used in news sequences which allowed a fast search of news stories (see Albiol, A. et al., 3rd Int. Conf. on Audio and Video-based Biometric Person Authentication, p.366-71, 2001). We now present a new shot descriptor technique which improves the previous search results by using a simple, yet efficient, algorithm, based on the information contained in consecutive fra…

AuthenticationSequenceInformation retrievalContextual image classificationBiometricsComputer scienceSpeech recognitionSearch engine indexingcomputer.software_genreObject detectionReduction (complexity)Face (geometry)PreprocessorAudio signal processingcomputerImage retrievalIEEE International Conference on Acoustics Speech and Signal Processing
researchProduct

Textual data compression in computational biology: Algorithmic techniques

2012

Abstract In a recent review [R. Giancarlo, D. Scaturro, F. Utro, Textual data compression in computational biology: a synopsis, Bioinformatics 25 (2009) 1575–1586] the first systematic organization and presentation of the impact of textual data compression for the analysis of biological data has been given. Its main focus was on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used together with a technical presentation of how well-known notions from information theory have been adapted to successfully work on biological data. Rather surprisingly, the use of data compression is pervasive in computational biology. Starting from…

Biological dataData Compression Theory and Practice Alignment-free sequence comparison Entropy Huffman coding Hidden Markov Models Kolmogorov complexity Lempel–Ziv compressors Minimum Description Length principle Pattern discovery in bioinformatics Reverse engineering of biological networks Sequence alignmentSettore INF/01 - InformaticaGeneral Computer ScienceKolmogorov complexityComputer scienceSearch engine indexingComputational biologyInformation theoryInformation scienceTheoretical Computer ScienceTechnical PresentationEntropy (information theory)Data compressionComputer Science Review
researchProduct

ImageRover: A Content-Based Image Browser for the World Wide Web

1997

ImageRover is a search-by-image-content navigation tool for the World Wide Web (WWW). To gather images expediently, the image collection subsystem utilizes a distributed fleet of WWW robots running on different computers. The image robots gather information about the images they find, computing the appropriate image decompositions and indices, and store this extracted information in vector form for searches based on image content. At search time, users can iteratively guide the search through the selection of relevant examples. Search performance is made efficient through the use of an approximate, optimized k-d tree algorithm. The system employs a novel relevance feedback algorithm that se…

CBIRInformation retrievalDistributed databasebusiness.industryComputer scienceSearch engine indexingRelevance feedbackcomputer.software_genreWorld Wide WebInformation extractionTree (data structure)RobotThe InternetbusinessImage retrievalcomputer
researchProduct

Languages with mismatches

2007

AbstractIn this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…

Combinatorics on wordsApproximate string matchingGeneral Computer ScienceRepetition (rhetorical device)String (computer science)Search engine indexingComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Approximate string matchingData structureTheoretical Computer ScienceCombinatoricsSet (abstract data type)Formal languagesCombinatorics on words Formal languages Approximate string matching IndexingIndexingWord (group theory)MathematicsInteger (computer science)Computer Science(all)Theoretical Computer Science
researchProduct

"Indexing structures for approximate string matching

2003

In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their size is linear times a polylog of the size of the text on average. ii) For each pattern x, the time spent by our algorithms for finding the list occ(x) of all occurrences of a pattern x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.

CombinatoricsCombinatorics on wordsPattern recognition (psychology)Search engine indexingAutomata theoryHamming distanceString searching algorithmApproximate string matchingTime complexityMathematics
researchProduct

Creating a semantically-enhanced cloud services environment through ontology evolution

2014

Currently, the availability of Web resources has grown enormously to the point that whatever a user needs at a given moment can potentially be found on the Internet. These resources are not limited to data items anymore, functionality delivered through some sort of service architectural model is also offered on the Internet. In the last few years, cloud computing has emerged as one of the most popular computing models to provide services over the Internet. However, as the number of available cloud services increases, the problem of service discovery and selection arises. Experience indicates that semantic technologies can provide the basis for enhanced and more precise search processes. In …

Computer Networks and CommunicationsComputer sciencecomputer.internet_protocolService discoveryCloud computingcomputer.software_genreSocial Semantic WebOWL-SWorld Wide WebSemantic computingSemantic analyticsUpper ontologySemantic Web StackSemantic WebInformation retrievalbusiness.industrySearch engine indexingSemantic searchInformation extractionSemantic gridHardware and ArchitectureOntologySemantic technologyThe InternetWeb resourcebusinesscomputerSoftwareFuture Generation Computer Systems
researchProduct

Indexing a sequence for mapping reads with a single mismatch

2014

Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the s…

Computer sciencegenome sequenceGeneral Mathematics[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]General Physics and AstronomyContext (language use)algorithmscomputer.software_genrePattern matchingSequenceSearch engine indexingGeneral EngineeringWildcard characterArticlescomputer.file_formatConstruct (python library)Data structuremapping readspattern matchingComputingMethodologies_DOCUMENTANDTEXTPROCESSINGData mining[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]Focus (optics)mismatchcomputerAlgorithmindexingPhilosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
researchProduct